Method of demand scrubbing by placing corrected data in memory-side cache

ABSTRACT

Systems, apparatuses, and methods related to chiplets are described. A chiplet-based system may include a memory controller chiplet to control accesses to a storage array, and the memory controller chiplet can facilitate error correction and cache management in a manner that minimizes the need to interrupt a sequence of data reads in order to write corrected data from a prior read back into the storage array. For example, a read command may be received at a memory controller device of the memory system from a requesting device. Data responsive to the read command may be obtained and determined to include a correctable error. The data may be corrected, transmitted to the requesting device, and written to a cache of the memory controller device with an indication that the data is valid and dirty (e.g., includes an error or corrected error). The data is written back to the memory array in response to a cache eviction event.

PRIORITY APPLICATION

This application is a continuation of U.S. application Ser. No. 17/007,811, filed Aug. 31, 2020, which is incorporated herein by reference in its entirety.

STATEMENT REGARDING GOVERNMENT SUPPORT

This invention was made with U.S. Government support under Agreement No. HR00111830003, awarded by DARPA. The U.S. Government has certain rights in the invention.

TECHNICAL FIELD

Embodiments of the disclosure relate generally to chiplet-based systems, and more specifically to systems and methods to operate a memory controller as may be implemented in one or more chiplets of a chiplet-based system.

BACKGROUND

Chiplets are an emerging technique for integrating various processing functionality. Generally, a chiplet system is made up of discrete chips (e.g., integrated circuits (ICs) on different substrates or dies) that are integrated on an interposer and packaged together. This arrangement is distinct from single chips (e.g., ICs) that contain distinct device blocks (e.g., intellectual property (IP) blocks) on one substrate (e.g., a single die), such as a system-on-a-chip (SoC), or discretely packaged devices integrated on a board. In general, chiplets provide better performance (e.g., lower power consumption, reduced latency, etc.) than discretely packaged devices, and chiplets provide greater production benefits than single-die chips. These production benefits can include higher yields or reduced development costs and time.

Chiplet systems are generally made up of one or more application chiplets and support chiplets. Here, the distinction between application and support chiplets is simply a reference to the likely design scenarios for the chiplet system. Thus, for example, a synthetic vision chiplet system can include an application chiplet to produce the synthetic vision output along with support chiplets, such as a memory controller chiplet, sensor interface chiplet, or communication chiplet. In an example use case, the synthetic vision designer can design the application chiplet and source the support chiplets from other parties. Thus, the design expenditure (e.g., in terms of time or complexity) is reduced by avoiding the design and production of functionality embodied in the support chiplets. Chiplets also support the tight integration of IP blocks that can otherwise be difficult, such as those using different feature sizes. Thus, for example, devices designed during a previous fabrication generation with larger feature sizes, or those devices in which the feature size is optimized for power, speed, or heat generation (e.g., for sensor applications), can be integrated with devices having different feature sizes more easily than attempting to do so on a single die. Additionally, by reducing the overall size of the die, the yield for chiplets tends to be higher than that of more complex, single-die devices.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.

FIGS. 1A-1B illustrate an example of a chiplet system, in accordance with some examples described herein.

FIG. 2 is a block diagram of an example of a memory controller chiplet, in accordance with some examples described herein.

FIG. 3 is a flow diagram of a method of operating a memory controller chiplet, in accordance with some examples described herein.

FIG. 4 illustrates a block diagram of an example machine, in accordance with some examples described herein.

DETAILED DESCRIPTION

Embodiments of the present disclosure are directed to initializing electronic systems that include chiplets. A chiplet-based system may include chiplets that each perform a different function, or the system may include multiple chiplets that perform the same function, but with the multiple chiplets configured together (e.g., to implement parallelism in performing the function) to provide a higher performance solution. An example chiplet-based system may include a memory controller chiplet to control accesses to a storage array. To provide high performance in terms of memory accesses through the use of the memory controller chiplet, the memory controller chiplet can be configured in accordance with the current description to facilitate error correction and cache management in a manner to minimize the need to interrupt a sequence of data reads to write corrected data from a prior read back into the storage array.

FIGS. 1A and 1B illustrate an example of a chiplet system 110. FIG. 1A is a representation of the chiplet system 110 mounted on a peripheral board 105 that can be connected to a broader computer system by a peripheral component interconnect express (PCIe) interface, for example. The chiplet system 110 includes a package substrate 115, an interposer 120, and four chiplets: an application chiplet 125, a host interface chiplet 135, a memory controller chiplet 140, and a memory device chiplet 150. The package of the chiplet system 110 is illustrated with a lid 165, though other covering techniques for the package can be used. FIG. 1B is a block diagram labeling the components in the chiplet system for clarity.

The application chiplet 125 is illustrated as including a network-on-chip (NOC) 130 to support an inter-chiplet communications network, or chiplet network 155. The NOC 130 is generally included on the application chiplet 125 because it is usually created after the support chiplets (e.g., chiplets 135, 140, and 150) are selected, thus enabling a designer to select an appropriate number of chiplet network connections or switches for the NOC 130. In an example, the NOC 130 can be located on a separate chiplet, or even within the interposer 120. In an example, the NOC 130 implements a chiplet protocol interface (CPI) network.

A CPI network is a packet-based network that supports virtual channels to enable a flexible and high-speed interaction between chiplets. CPI enables bridging from intra-chiplet networks to the chiplet network 155. For example, the Advanced eXtensible Interface (AXI) is a widely used specification to design intra-chip communications. AXI specifications, however, cover a great variety of physical design options, such as the number of physical channels, signal timing, power, etc. Within a single chip, these options are generally selected to meet design goals, such as power consumption, speed, etc. To achieve the flexibility of the chiplet system, CPI is used as an adapter to interface between the various AXI design options (or non-AXI communication protocols) that can be used across the various chiplets. By enabling a physical-channel-to-virtual-channel mapping and encapsulating time-based signaling with a packetized protocol, CPI successfully bridges intra-chiplet networks across the chiplet network 155.

CPI can use a variety of different physical layers to transmit packets. The physical layer can include simple conductive connections or include drivers to transmit the signals over longer distances or drive greater loads. An example of one such physical layer can include the Advanced Interface Bus (AIB), implemented in the interposer 120. AIB transmits and receives data using source-synchronous data transfers with a forwarded clock. Packets are transferred across the AIB at single data rate (SDR) or dual data rate (DDR) with respect to the transmitted clock. Various channel widths are supported by AIB. AIB channel widths are in multiples of 20 bits when operated in SDR mode (20, 40, 60, . . . ), and multiples of 40 bits in DDR mode (40, 80, 120, . . . ). The AIB channel width includes both transmit and receive signals. The channel can be configured to have a symmetrical number of transmit (TX) and receive (RX) input/outputs (I/Os), or have a non-symmetrical number of transmitters and receivers (e.g., either all transmitters or all receivers). The AIB channel can act as an AIB master or slave depending on which chiplet provides the master clock. AIB I/O cells support three clocking modes: asynchronous (i.e., non-clocked), SDR, and DDR. The non-clocked mode is used for clocks and some control signals. The SDR mode can use dedicated SDR-only I/O cells, or dual-use SDR/DDR I/O cells.

In an example, CPI packet protocols (e.g., point-to-point or routable) can use symmetrical receive and transmit I/O cells within an AIB channel. The CPI streaming protocol allows more flexible use of the AIB I/O cells. In an example, an AIB channel for streaming mode can configure the I/O cells as all TX, all RX, or half TX and half RX. CPI packet protocols can use an AIB channel in either SDR or DDR operation modes. In an example, the AIB channel is configurable in increments of 80 I/O cells (i.e., 40 TX and 40 RX) for SDR mode and 40 I/O cells for DDR mode. The CPI streaming protocol can use an AIB channel in either SDR or DDR operation modes. Here, in an example, the AIB channel is in increments of 40 I/O cells for both SDR and DDR modes. In an example, each AIB channel is assigned a unique interface identifier. The interface identifier is used during CPI reset and initialization to determine paired AIB channels across adjacent chiplets. In an example, the interface identifier is a 20-bit value comprising a seven-bit chiplet identifier, a seven-bit column identifier, and a six-bit link identifier. The AIB physical layer transmits the interface identifier using an AIB out-of-band shift register. The 20-bit interface identifier is transferred in both directions across an AIB interface using bits 32-51 of the shift registers.
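
As an illustration of the identifier layout just described, the following C sketch packs and unpacks a 20-bit interface identifier from its seven-bit chiplet, seven-bit column, and six-bit link fields. The bit ordering (chiplet identifier in the most significant bits) is an assumption made for the example and is not specified above.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical layout: [19:13] chiplet ID, [12:6] column ID, [5:0] link ID. */
    static uint32_t pack_interface_id(uint32_t chiplet, uint32_t column, uint32_t link)
    {
        return ((chiplet & 0x7Fu) << 13) | ((column & 0x7Fu) << 6) | (link & 0x3Fu);
    }

    static void unpack_interface_id(uint32_t id,
                                    uint32_t *chiplet, uint32_t *column, uint32_t *link)
    {
        *chiplet = (id >> 13) & 0x7Fu;
        *column  = (id >> 6) & 0x7Fu;
        *link    = id & 0x3Fu;
    }

    int main(void)
    {
        uint32_t c, col, l;
        uint32_t id = pack_interface_id(5, 2, 17);
        unpack_interface_id(id, &c, &col, &l);
        printf("id=0x%05x chiplet=%u column=%u link=%u\n", id, c, col, l);
        return 0;
    }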

AIB defines a stacked set of AIB channels as an AIB channel column. An AIB channel column has some number of AIB channels, plus an auxiliary (AUX) channel that can be used for out-of-band signaling. The auxiliary channel contains signals used for AIB initialization. All AIB channels (other than the auxiliary channel) within a column are of the same configuration (e.g., all TX, all RX, or half TX and half RX, as well as having the same number of data I/O signals). In an example, AIB channels are numbered in continuous increasing order starting with the AIB channel adjacent to the AUX channel. The AIB channel adjacent to the AUX is defined to be AIB channel zero.

In general, CPI interfaces of individual chiplets can include serialization-deserialization (SERDES) hardware. SERDES interconnects work well for scenarios in which high-speed signaling with a low signal count is desirable. However, SERDES can result in additional power consumption and longer latencies for multiplexing and demultiplexing, error detection or correction (e.g., using block-level cyclic redundancy checking (CRC)), link-level retry, or forward error correction. For ultra-short-reach chiplet-to-chiplet interconnects where low latency or energy consumption is a primary concern, a parallel interface with clock rates that allow data transfer with minimal latency can be a better solution. CPI includes elements to minimize both latency and energy consumption in these ultra-short-reach chiplet interconnects.

For flow control, CPI employs a credit-based technique. A CPI recipient, such as the application chiplet 125, provides a CPI sender, such as the memory controller chiplet 140, with credits that represent available buffers. In an example, a CPI recipient includes a buffer for each virtual channel for a given time-unit of transmission. Thus, if the CPI recipient supports five messages in time and a single virtual channel, the recipient has five buffers arranged in five rows (e.g., one row for each unit time). If four virtual channels are supported, then the CPI recipient has twenty buffers arranged in five rows. Each buffer is sized to hold the payload of one CPI packet.

When the CPI sender transmits to the CPI recipient, the sender decrements the available credits based on the transmission. Once all credits for the recipient are consumed, the sender stops sending packets to the recipient. This ensures that the recipient always has an available buffer to store the transmission.

As the recipient processes received packets and frees buffers, the recipient communicates the available buffer space back to the sender. This credit return can then be used by the sender to transmit additional information.
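
A minimal C sketch of this credit-based flow control is shown below. The structure and function names are illustrative and not taken from the specification; the sketch only models the counting of credits at the sender.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cpi_sender {
        uint32_t credits; /* buffers currently believed free at the recipient */
    };

    /* Attempt to send one packet: consume a credit if one is available. */
    static bool cpi_try_send(struct cpi_sender *s)
    {
        if (s->credits == 0)
            return false;   /* no free buffer at the recipient; hold the packet */
        s->credits--;       /* one recipient buffer is now in use */
        /* ... transmit the packet over the chiplet network here ... */
        return true;
    }

    /* Called when the recipient returns credits after freeing buffers. */
    static void cpi_credit_return(struct cpi_sender *s, uint32_t freed)
    {
        s->credits += freed;
    }

    int main(void)
    {
        struct cpi_sender s = { .credits = 2 };
        printf("%d %d %d\n", cpi_try_send(&s), cpi_try_send(&s), cpi_try_send(&s));
        cpi_credit_return(&s, 1);
        printf("%d\n", cpi_try_send(&s));
        return 0;
    }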

Also illustrated in FIGS. 1A and 1B is a chiplet mesh network 160 that uses a direct, chiplet-to-chiplet technique without the need for the NOC 130. The chiplet mesh network 160 can be implemented in CPI, or another chiplet-to-chiplet protocol. The chiplet mesh network 160 generally enables a pipeline of chiplets where one chiplet serves as the interface to the pipeline while other chiplets in the pipeline interface only with themselves.

Additionally, dedicated device interfaces, such as the memory interface 145, can also be used to interconnect chiplets, or to connect chiplets to external devices, such as the host interface chiplet 135 providing a PCIe interface external to the board 105 for the application chiplet 125. Such dedicated interfaces 145 are generally used when a convention or standard in the industry has converged on such an interface. The illustrated example of a Double Data Rate (DDR) interface 145 connecting the memory controller chiplet 140 to a dynamic random access memory (DRAM) memory device chiplet 150 is an example of such an industry convention.

Of the variety of possible support chiplets, the memory controller chiplet 140 is likely present in the chiplet system 110 due to the near omnipresent use of storage for computer processing as well as a sophisticated state of the art for memories. Thus, using memory device chiplets 150 and memory controller chiplets 140 produced by others gives chiplet system designers access to robust products by sophisticated producers. Generally, the memory controller chiplet 140 provides a memory-device-specific interface to read, write, or erase data. Often, the memory controller chiplet 140 can provide additional features, such as error detection, error correction, maintenance operations, or atomic operation execution. Maintenance operations tend to be specific to the memory device chiplet 150, such as garbage collection in NAND flash or storage class memories, or temperature adjustments (e.g., cross-temperature management) in NAND flash memories. In an example, the maintenance operations can include logical-to-physical (L2P) mapping or management to provide a level of indirection between the physical and logical representation of data.

An atomic operation is a data manipulation performed by the memory controller chiplet 140. For example, an atomic operation of “increment” can be specified in a command by the application chiplet 125, the command including a memory address and possibly an increment value. Upon receiving the command, the memory controller chiplet 140 retrieves a number from the specified memory address, increments the number by the amount specified in the command, and stores the result. Upon successful completion, the memory controller chiplet 140 provides an indication of the command's success to the application chiplet 125. Atomic operations avoid transmitting the data across the chiplet network 160, resulting in lower-latency execution of such commands.
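
For illustration only, the following C sketch models how a memory controller might service such an “increment” command locally, so that only the command and a completion indication cross the chiplet network. The command structure and helper names are assumptions, not part of the description above.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct atomic_cmd {
        uint64_t address;    /* memory address named in the command       */
        uint64_t increment;  /* increment value supplied with the command */
    };

    /* mem[] stands in for the storage array managed by the controller. */
    static bool handle_increment(uint64_t *mem, const struct atomic_cmd *cmd)
    {
        uint64_t value = mem[cmd->address];         /* retrieve the number          */
        mem[cmd->address] = value + cmd->increment; /* increment and store result   */
        return true;                                /* success indication returned  */
    }

    int main(void)
    {
        uint64_t mem[8] = { 0 };
        struct atomic_cmd cmd = { .address = 3, .increment = 5 };
        handle_increment(mem, &cmd);
        printf("mem[3] = %llu\n", (unsigned long long)mem[3]);
        return 0;
    }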

Atomic operations can be classified as built-in atomics or programmable (e.g., custom) atomics. Built-in atomics are a finite set of operations that are immutably implemented in hardware. Programmable atomics are small programs that can run on a programmable atomic unit (PAU) (e.g., a custom atomic unit (CAU)) of the memory controller chiplet 140. An example of a memory controller chiplet 140 implementing a PAU is described in regard to FIG. 2.

The memory device chiplet 150 can be, or can include, any combination of volatile memory devices or non-volatile memories. Examples of volatile memory devices include, but are not limited to, random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), and graphics double data rate type 6 SDRAM (GDDR6 SDRAM). Examples of non-volatile memory devices include, but are not limited to, negative-and (NAND)-type flash memory, storage class memory (e.g., phase-change memory or memristor-based technologies), and ferroelectric RAM (FeRAM). The illustrated example of FIGS. 1A and 1B includes the memory device 150 as a chiplet; however, the memory device 150 can reside elsewhere, such as in a different package on the board 105.

FIG. 2 illustrates components of an example of a memory controller chiplet 205, according to an embodiment. The memory controller chiplet 205 includes a cache memory or cache 210, a cache controller 215, an off-die memory controller 220 (e.g., to communicate with off-die memory 275), a network communication interface 225 (e.g., to interface with a chiplet network 280 and communicate with other chiplets), and a set of atomic and merge operations 250. Members of this set can include a write merge unit 255, a memory hazard unit 260, a built-in atomic unit 265, or a PAU 270. The various components are illustrated logically, and not as they necessarily would be implemented. For example, the built-in atomic unit 265 likely comprises different devices along a path to the off-die memory. In contrast, the PAU 270 is likely implemented in a separate processor on the memory controller chiplet 205.

The off-die memory controller 220 is directly coupled to the off-die memory 275 (e.g., via a bus or other communication connection) to provide write operations and read operations to and from the off-die memory 275. The off-die memory includes a memory array containing memory cells. The off-die memory may be one or more memory die or a memory device chiplet. The off-die memory controller 220 is also coupled for output to the atomic and merge operations unit 250, and for input to the cache controller 215.

The cache controller 215 is directly coupled to the cache 210, and is also coupled to the network communication interface 225 for input (such as incoming read or write requests) and to the off-die memory controller 220 for output.

The network communication interface 225 includes a packet decoder 230, network input queues 235, a packet encoder 240, and network output queues 245 to support a packet-based chiplet network 280 (e.g., a CPI network). The chiplet network 280 can provide packet routing between and among processors, memory controllers, hybrid threading processors, configurable processing circuits, or communication interfaces. In such a packet-based communication system, each packet typically includes destination and source addressing, along with any data payload or instruction. In an example, the chiplet network 280 can be implemented as a collection of crossbar switches having a folded Clos configuration, or a mesh network providing for additional connections, depending upon the configuration. The chiplet network 280 can be part of an asynchronous switching fabric. In this example, a data packet can be routed along any of various paths, such that the arrival of any selected data packet at an addressed destination can occur at any of a plurality of different times, depending upon the routing. The chiplet network 280 can also be implemented as a synchronous communication network, such as a synchronous mesh communication network. Any and all such communication networks are considered equivalent and within the scope of the disclosure.

The memory controller chiplet 205 can receive a packet having a source address, a read request, and a physical address. In response, the off-die memory controller 220 or the cache controller 215 will read the data from the specified physical address (which can be in the off-die memory 275 or in the cache 210) and assemble a response packet to the source address containing the requested data. Similarly, the memory controller chiplet 205 can receive a packet having a source address, a write request, and a physical address. In response, the memory controller chiplet 205 will write the data to the specified physical address (which can be in the off-die memory 275 or in the cache 210) and assemble a response packet to the source address containing an acknowledgment that the data was stored to a memory.
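
The C sketch below illustrates this request/response pattern. The field names and sizes are assumptions chosen for the example; the actual CPI packet format is not detailed here.

    #include <stdint.h>
    #include <string.h>

    enum req_type { REQ_READ, REQ_WRITE };

    struct request_packet {
        uint32_t source_address;   /* where the response should be routed              */
        enum req_type type;        /* read or write                                    */
        uint64_t physical_address; /* location in the cache 210 or off-die memory 275  */
        uint8_t  payload[64];      /* write data, if any                               */
    };

    struct response_packet {
        uint32_t destination;      /* copied from the request's source address         */
        uint8_t  data[64];         /* read data, or an acknowledgment for writes       */
    };

    /* Assemble a response packet for a serviced read request. */
    void assemble_read_response(const struct request_packet *req,
                                const uint8_t read_data[64],
                                struct response_packet *rsp)
    {
        rsp->destination = req->source_address;
        memcpy(rsp->data, read_data, 64);
    }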

Thus, the memory controller chiplet 205 can receive read and write requests via the chiplet network 280 and process the requests using the cache controller 215 interfacing with the cache 210. If the request cannot be handled by the cache controller 215, the off-die memory controller 220 handles the request by communication with the off-die memory 275, by the atomic and merge operations 250, or by both. Data read by the off-die memory controller 220 can be stored in the cache 210 by the cache controller 215 for later use.

The atomic and merge operations 250 are coupled to receive (as input) the output of the off-die memory controller 220, and to provide output to the cache 210, the network communication interface 225, or directly to the chiplet network 280. The memory hazard clear (reset) unit 260, write merge unit 255, and the built-in (e.g., predetermined) atomic operations unit 265 can each be implemented as state machines with other combinational logic circuitry (such as adders, shifters, comparators, AND gates, OR gates, XOR gates, or any suitable combination thereof) or other logic circuitry. These components can also include one or more registers or buffers to store operand or other data. The PAU 270 can be implemented as one or more processor cores or control circuitry, and various state machines with other combinational logic circuitry or other logic circuitry, and can also include one or more registers, buffers, or memories to store addresses, executable instructions, operand and other data, or can be implemented as a processor.

The write merge unit 255 receives read data and request data, and merges the request data and read data to create a single unit having the read data and the source address to be used in the response or return data packet. The write merge unit 255 provides the merged data to the write port of the cache 210 (or, equivalently, to the cache controller 215 to write to the cache 210). Optionally, the write merge unit 255 provides the merged data to the network communication interface 225 to encode and prepare a response or return data packet for transmission on the chiplet network 280.

When the request data is for a built-in atomic operation, the built-in atomic operations unit 265 receives the request and read data, either from the write merge unit 255 or directly from the off-die memory controller 220. The atomic operation is performed, and using the write merge unit 255, the resulting data is written to the cache 210, or provided to the network communication interface 225 to encode and prepare a response or return data packet for transmission on the chiplet network 280.

The built-in atomic operations unit 265 handles predefined atomic operations such as fetch-and-increment or compare-and-swap. In an example, these operations perform a simple read-modify-write operation to a single memory location of 32 bytes or less in size. Atomic memory operations are initiated from a request packet transmitted over the chiplet network 280. The request packet has a physical address, atomic operator type, operand size, and optionally up to 32 bytes of data. The atomic operation performs the read-modify-write to a cache memory line of the cache 210, filling the cache memory if necessary. The atomic operator response can be a simple completion response, or a response with up to 32 bytes of data. Example atomic memory operators include fetch-and-AND, fetch-and-OR, fetch-and-XOR, fetch-and-add, fetch-and-subtract, fetch-and-increment, fetch-and-decrement, fetch-and-minimum, fetch-and-maximum, fetch-and-swap, and compare-and-swap. In various example embodiments, 32-bit and 64-bit operations are supported, along with operations on 16 or 32 bytes of data. Methods disclosed herein are also compatible with hardware supporting larger or smaller operations and more or less data.

Built-in atomic operations can also involve requests for a “standard” atomic operation on the requested data, such as comparatively simple, single-cycle, integer atomics (e.g., fetch-and-increment or compare-and-swap), which will occur with the same throughput as a regular memory read or write operation not involving an atomic operation. For these operations, the cache controller 215 generally reserves a cache line in the cache 210 by setting a hazard bit (in hardware), so that the cache line cannot be read by another process while it is in transition. The data is obtained from either the off-die memory 275 or the cache 210, and is provided to the built-in atomic operation unit 265 to perform the requested atomic operation. Following the atomic operation, in addition to providing the resulting data to the packet encoder 240 to encode outgoing data packets for transmission on the chiplet network 280, the built-in atomic operation unit 265 provides the resulting data to the write merge unit 255, which will also write the resulting data to the cache 210. Following the writing of the resulting data to the cache 210, any corresponding hazard bit which was set will be cleared by the memory hazard clear unit 260.
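
A simplified C sketch of that hazard-bit sequence for a fetch-and-add follows. The cache-line structure and helpers are hypothetical; in hardware, these steps are performed by the state machines described above rather than by software.

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_line {
        uint64_t data;
        bool     hazard; /* set while the line is "in transition" */
    };

    static uint64_t builtin_fetch_and_add(struct cache_line *line, uint64_t addend)
    {
        line->hazard = true;       /* reserve the cache line; block other readers    */
        uint64_t old = line->data; /* data obtained from the cache (or off-die fill) */
        line->data = old + addend; /* result written back via the write merge path   */
        line->hazard = false;      /* hazard bit cleared by the memory hazard unit   */
        return old;                /* returned in the outgoing response packet       */
    }

    int main(void)
    {
        struct cache_line line = { .data = 41, .hazard = false };
        uint64_t old = builtin_fetch_and_add(&line, 1);
        return (old == 41 && line.data == 42) ? 0 : 1;
    }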

The PAU 270 enables high performance (high throughput and low latency) for programmable atomic operations (also referred to as “custom atomic operations”), comparable to the performance of built-in atomic operations. Rather than executing multiple memory accesses, in response to an atomic operation request designating a programmable atomic operation and a memory address, circuitry in the memory controller chiplet 205 transfers the atomic operation request to the PAU 270 and sets a hazard bit stored in a memory hazard register corresponding to the memory address of the memory line used in the atomic operation, to ensure that no other operation (read, write, or atomic) is performed on that memory line. The hazard bit is then cleared upon completion of the atomic operation. Additional direct data paths provided for the PAU 270 to execute the programmable atomic operations allow for additional write operations without any limitations imposed by the bandwidth of the communication networks and without increasing any congestion of the communication networks.

The PAU 270 includes a RISC-V instruction set architecture (RISC-V ISA)-based multi-threaded processor having one or more processor cores, and may further have an extended instruction set for executing programmable atomic operations. When provided with the extended instruction set for executing programmable atomic operations, the PAU 270 can be embodied as one or more hybrid threading processors. In some example embodiments, the PAU 270 provides barrel-style, round-robin instantaneous thread switching to maintain a high instruction-per-clock rate.

Programmable atomic operations can be performed by the PAU 270 and involve requests for a programmable atomic operation on the requested data. A user can prepare programming code to provide such programmable atomic operations. For example, the programmable atomic operations can be comparatively simple, multi-cycle operations such as floating-point addition, or comparatively complex, multi-instruction operations such as a Bloom filter insert. The programmable atomic operations can be the same as or different than the predetermined atomic operations, insofar as they are defined by the user rather than a system vendor. For these operations, the cache controller 215 can reserve a cache line in the cache 210 by setting a hazard bit (in hardware), so that the cache line cannot be read by another process while it is in transition. The data is obtained from either the off-die memory 275 or the cache 210, and is provided to the PAU 270 to perform the requested programmable atomic operation. Following the atomic operation, the PAU 270 will provide the resulting data to the network communication interface 225 to directly encode outgoing data packets having the resulting data for transmission on the chiplet network 280. In addition, the PAU 270 will provide the resulting data to the cache controller 215, which will also write the resulting data to the cache 210. Following the writing of the resulting data to the cache 210, any corresponding hazard bit which was set will be cleared by the cache controller 215.

The approach taken for programmable atomic operations is to provide multiple, generic, custom atomic request types that can be sent through the chiplet network 280 to the memory controller chiplet 205 from an originating source such as a processor or other system component. The cache controller 215 and off-die memory controller 220 identify the request as a custom atomic request and forward the request to the PAU 270. In a representative embodiment, the PAU 270: (1) is a programmable processing element capable of efficiently performing a user-defined atomic operation; (2) can perform loads and stores to memory, arithmetic and logical operations, and control flow decisions; and (3) can leverage the RISC-V ISA with a set of new, specialized instructions to facilitate interacting with the cache and off-die memory controllers 215, 220 to atomically perform the user-defined operation. It should be noted that the RISC-V ISA contains a full set of instructions that support high-level language operators and data types. The PAU 270 can leverage the RISC-V ISA, but generally supports a more limited set of instructions and a limited register file size to reduce the die size of the unit when included within the memory controller chiplet 205.

As mentioned above, any hazard bit which is set will be cleared by the memory hazard clear unit 260. Prior to the writing of the read data to the cache 210, a set hazard bit for the reserved cache line is to be cleared by the memory hazard clear unit 260. Accordingly, when the request and read data is received by the write merge unit 255, a reset or clear signal can be transmitted by the memory hazard clear unit 260 to the cache 210 to reset the set memory hazard bit for the reserved cache line. Also, resetting this hazard bit will release a pending read or write request involving the designated (or reserved) cache line, providing the pending read or write request to an inbound request multiplexer for selection and processing.

FIG. 3 is a flow diagram of an example of a method of operating a memory system, such as a memory system including the memory controller chiplet 205 and the off-die memory 275 of FIG. 2. The off-die memory 275 may include memory die (e.g., NAND memory die) or a memory device chiplet. At 305, a memory read request is received by the memory controller chiplet 205 from a second device. The second device may be another chiplet, such as the application chiplet or host interface chiplet of the example of FIG. 1. The memory read request includes a memory address, and the memory read request is decoded by the memory controller. In response to the memory read request, the memory controller obtains the requested read data from the memory controller cache 210 or from a memory array included in the off-die memory 275.

When the memory controller receives and decodes the memory read request, it queries the memory controller cache 210 to check whether the requested data is stored in the memory controller cache 210. If the requested data is not stored in the memory controller cache 210, the cache controller 215 allocates a cache line for the read data and indicates that the state of the cache line is invalid.
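
For illustration, a minimal C model of this miss handling is shown below, assuming a simple three-state cache line (invalid, valid, valid-and-dirty). The types and names are hypothetical.

    #include <stdint.h>

    enum line_state { LINE_INVALID, LINE_VALID, LINE_VALID_DIRTY };

    struct mc_cache_line {
        uint64_t        tag;      /* memory address tag for the line          */
        enum line_state state;    /* cache line state tracked by controller   */
        uint8_t         data[64]; /* read data, filled once the fetch returns */
    };

    /* On a miss, a line is allocated for the incoming read data and marked
     * invalid until the fill from the off-die memory 275 completes. */
    void allocate_on_miss(struct mc_cache_line *line, uint64_t tag)
    {
        line->tag = tag;
        line->state = LINE_INVALID;
    }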

At 310 in FIG. 3, because the read data is not in the memory controller cache 210, read data for the memory read request is fetched from the memory array in the off-die memory 275 according to the memory address. The off-die memory controller 220 sends the read request to the media scheduler 232. The media scheduler 232 sends a request for the read data to the memory array via a memory array bus included in the interface to the off-die memory 275.

The memory read operation provides error correction. The memory controller may use an error correction code (ECC) to detect and correct errors. The error correction code may be stored with the data in the memory array or stored separately from the data. The error correction code may have been generated by the memory controller using ECC circuitry when the write data was received from the second device, or may have been provided with data written into the memory array during a write operation.

If the error correction code is stored with the data, only one read request is sent. If the error correction code is not stored with the data, two separate requests are sent: one request for the data stored at a first memory address, and a second request for the error correction code stored at a different memory address. The media scheduler 232 finds a time slot for the read request on the memory array bus and the request is sent to the memory array.
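
The following C sketch captures that request-count decision, assuming a hypothetical media-scheduler request structure; it simply builds one request when the ECC is stored inline with the data and two requests otherwise.

    #include <stdbool.h>
    #include <stdint.h>

    struct media_request {
        uint64_t address; /* address to read on the memory array bus */
    };

    /* Returns the number of requests to hand to the media scheduler. */
    int build_read_requests(uint64_t data_addr, uint64_t ecc_addr,
                            bool ecc_stored_with_data,
                            struct media_request out[2])
    {
        out[0].address = data_addr;   /* data (and inline ECC, if present)      */
        if (ecc_stored_with_data)
            return 1;                 /* single read request is sufficient      */
        out[1].address = ecc_addr;    /* ECC kept at a different memory address */
        return 2;                     /* data and ECC fetched with two requests */
    }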

The off-die memory may include one or more NAND die, and the memory array may be a NAND flash memory array. The memory array bus may be a NAND flash bus using a double data rate (DDR) interface. The memory controller chiplet 205 may include physical layer circuitry that sends commands to the memory array, and sends write data and receives the read data using the memory array bus.

The read data and the error correction code are received from the memory array and provided to the media scheduler 232. The media scheduler 232 returns the data and error correction code to the memory controller cache 210. The memory controller cache 210 uses the data and the error correction code to check for a correctable error in the read data. The memory controller cache 210 may include the error correction circuitry to correct the errors, or the error correction circuitry may be separate from the memory controller cache 210. If no error is found in the read data, the data is stored in the memory controller cache 210 in the cache line previously allocated for the data. The state of the cache line is changed to valid.

Returning to FIG. 3, at 315 the memory controller detects that the read data includes a correctable data error. At 320, the memory controller returns corrected read data to the requesting second device in response to the memory request.

At 325, the memory controller stores the corrected read data in the memory controller cache. State information of the data stored in the cache is also stored. The state of the corrected data in the memory controller cache is designated by the cache controller 215 as valid and dirty to indicate that the data in the cache is different from the data in the memory array. It should be noted that the dirty state or dirty flag is set as part of a read operation. Normally, a dirty flag is only used in write operations to signal to other processors that write data was received from a processor and the data in cache is modified from the copy in main memory; the dirty flag is not used during read operations. In the present example, only the off-die memory controller 220 is accessing the off-die memory devices, thus the system is fully coherent, and the dirty flag may be used for an additional purpose: to facilitate deferral of correcting data in the memory array (off-die memory 275) with the corrected data previously written in the memory controller cache 210.
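
A minimal C sketch of this state update is shown below; it assumes the same hypothetical cache-line structure as the earlier sketches and a flag from the ECC check indicating whether a correction was applied.

    #include <stdint.h>
    #include <string.h>

    enum line_state { LINE_INVALID, LINE_VALID, LINE_VALID_DIRTY };

    struct mc_cache_line {
        enum line_state state;
        uint8_t         data[64];
    };

    /* Fill the allocated line with the (possibly corrected) read data.
     * Valid-and-dirty records that the cached copy differs from the still
     * erroneous copy in the memory array and must be written back later. */
    void fill_after_read(struct mc_cache_line *line,
                         const uint8_t corrected[64],
                         int had_correctable_error)
    {
        memcpy(line->data, corrected, 64);
        line->state = had_correctable_error ? LINE_VALID_DIRTY : LINE_VALID;
    }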

At 330, sometime in the future the corrected read data is written to the memory array in response to a cache eviction event to correct errors in the memory array. The memory controller will write the data in the cache indicated by the dirty flag, and the error correction codes derived from the data, into the memory array. The data may be written in response to a cache event, such as a capacity eviction, flush operation, eviction maintenance request, or other cache operation that results in an eviction.

In some examples, the data may be written back to the memory as part of a demand scrubbing operation. To implement the demand scrubbing operation, the data can be written using the paths used in normal write operations without the need for a read-modify-write path, and the write operations for the demand scrubbing can be scheduled with other write operations. By using the memory controller cache, the path from the memory controller to the memory array is simpler. Also, the media scheduler does not have to implement the data correction.
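
Continuing the earlier sketches, the eviction-time write-back that performs the demand scrub might look like the following C fragment. The write-path function is a stub standing in for the normal write path through the media scheduler; all names are illustrative.

    #include <stdint.h>

    enum line_state { LINE_INVALID, LINE_VALID, LINE_VALID_DIRTY };

    struct mc_cache_line {
        uint64_t        address;  /* memory array address backing this line */
        enum line_state state;
        uint8_t         data[64];
    };

    /* Stub for the normal write path (ECC is derived from the data there). */
    static void mc_write_to_array(uint64_t address, const uint8_t data[64])
    {
        (void)address;
        (void)data;
        /* ... scheduled on the memory array bus with other writes ... */
    }

    /* On eviction, a dirty line is written back through the ordinary write
     * path, which corrects the stale copy in the memory array. */
    void evict_line(struct mc_cache_line *line)
    {
        if (line->state == LINE_VALID_DIRTY)
            mc_write_to_array(line->address, line->data); /* demand scrub write */
        line->state = LINE_INVALID;                        /* line is now free  */
    }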

FIG. 4 illustrates a block diagram of an example machine 400 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform, such as the demand scrubbing operations described herein, for example. In alternative embodiments, the machine 400 may operate as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 400 may operate in the capacity of a server machine, a client machine, or both in server-client network environments. In an example, the machine 400 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment. The machine 400 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, an IoT device, an automotive system, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.

The embodiments and examples, as described herein, may include, or may operate by, logic, components, devices, packages, or mechanisms. Circuitry is a collection (e.g., set) of circuits implemented in tangible entities that include hardware (e.g., simple circuits, gates, logic, etc.). Circuitry membership may be flexible over time and underlying hardware variability. Circuitries include members that may, alone or in combination, perform specific tasks when operating. In an example, hardware of the circuitry may be immutably designed to carry out a specific operation (e.g., hardwired). In an example, the hardware of the circuitry may include variably connected physical components (e.g., execution units, transistors, simple circuits, etc.) including a computer-readable medium physically modified (e.g., magnetically, electrically, movable placement of invariant massed particles, etc.) to encode instructions of the specific operation. In connecting the physical components, the underlying electrical properties of a hardware constituent are changed, for example, from an insulator to a conductor or vice versa. The instructions enable participating hardware (e.g., the execution units or a loading mechanism) to create members of the circuitry in hardware via the variable connections to carry out portions of the specific tasks when in operation. Accordingly, the computer-readable medium is communicatively coupled to the other components of the circuitry when the device is operating. In an example, any of the physical components may be used in more than one member of more than one circuitry. For example, under operation, execution units may be used in a first circuit of a first circuitry at one point in time and reused by a second circuit in the first circuitry, or by a third circuit in a second circuitry, at a different time.

The machine (e.g., computer system) 400 (e.g., the chiplet-based system of FIG. 1, etc.) may include a processing device 402 (e.g., a hardware processor, a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof, such as a memory control unit of the memory device 110, etc.), a main memory 404 (e.g., read-only memory (ROM), flash memory, dynamic random-access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 406 (e.g., flash memory, static random-access memory (SRAM), etc.), and a data storage system 418, some or all of which may communicate with each other via an interlink (e.g., bus) 430.

The processing device 402 can represent one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device 402 can be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets, or processors implementing a combination of instruction sets. The processing device 402 can also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 402 can be configured to execute instructions 426 for performing the operations and steps discussed herein. The machine 400 can further include a network interface device 408 to communicate over a network 420.

The data storage system 418 can include a machine-readable storage medium 424 (also known as a computer-readable medium) on which is stored one or more sets of instructions 426 or software embodying any one or more of the methodologies or functions described herein. The instructions 426 can also reside, completely or at least partially, within the main memory 404 or within the processing device 402 during execution thereof by the machine 400, the main memory 404 and the processing device 402 also constituting machine-readable storage media. The machine-readable storage medium 424, the data storage system 418, or the main memory 404 can correspond to the memory device chiplet 150 of FIG. 1. In one implementation, the instructions 426 include instructions 411 to implement functionality corresponding to writing error-corrected data from a memory controller cache to the storage media as part of a demand scrubbing operation.

While the machine-readable storage medium 424 is shown in an example implementation to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media that store the one or more sets of instructions. The term “machine-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “machine-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media. In an example, a massed machine-readable medium comprises a machine-readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, massed machine-readable media are not transitory propagating signals. Specific examples of massed machine-readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

The machine 400 may further include a display unit, an alphanumeric input device (e.g., a keyboard), and a user interface (UI) navigation device (e.g., a mouse). In an example, one or more of the display unit, the input device, or the UI navigation device may be a touch screen display. The machine may include a signal generation device (e.g., a speaker), or one or more sensors, such as a global positioning system (GPS) sensor, compass, accelerometer, or one or more other sensors. The machine 400 may include an output controller, such as a serial (e.g., universal serial bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate with or control one or more peripheral devices (e.g., a printer, card reader, etc.).

The instructions 426 (e.g., software, programs, an operating system (OS), etc.) or other data stored on the data storage device 418 can be accessed by the main memory 404 for use by the processing device 402. The main memory 404 (e.g., DRAM) is typically fast, but volatile, and thus a different type of storage than the data storage device 418 (e.g., an SSD), which is suitable for long-term storage, including while in an “off” condition. The instructions 426 or data in use by a user or the machine 400 are typically loaded in the main memory 404 for use by the processing device 402. When the main memory 404 is full, virtual space from the data storage device 418 can be allocated to supplement the main memory 404; however, because the data storage device 418 is typically slower than the main memory 404, and write speeds are typically at least twice as slow as read speeds, use of virtual memory can greatly degrade the user experience due to storage device latency (in contrast to the main memory 404, e.g., DRAM). Further, use of the data storage device 418 for virtual memory can greatly reduce the usable lifespan of the data storage device 418.

In contrast to virtual memory, virtual memory compression (e.g., the Linux™ kernel feature “ZRAM”) uses part of the memory as compressed block storage to avoid paging to the data storage device 418. Paging takes place in the compressed block until it is necessary to write such data to the data storage device 418. Virtual memory compression increases the usable size of the main memory 404, while reducing wear on the data storage device 418.

Storage devices optimized for mobile electronic devices, or mobile storage, traditionally include MMC solid-state storage devices (e.g., micro Secure Digital (microSD™) cards, etc.). MMC devices include a number of parallel interfaces (e.g., an 8-bit parallel interface) with a host (e.g., a host device), and are often removable and separate components from the host. In contrast, eMMC™ devices are attached to a circuit board and considered a component of the host, with read speeds that rival serial ATA™ (Serial AT (Advanced Technology) Attachment, or SATA) based SSD devices. However, demand for mobile device performance continues to increase, such as to fully enable virtual or augmented-reality devices, utilize increasing network speeds, etc. In response to this demand, storage devices have shifted from parallel to serial communication interfaces. Universal Flash Storage (UFS) devices, including controllers and firmware, communicate with a host using a low-voltage differential signaling (LVDS) serial interface with dedicated read/write paths, further advancing greater read/write speeds.

The instructions 426 may further be transmitted or received over a network 420 using a transmission medium via the network interface device 408 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., the Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, the IEEE 802.16 family of standards known as WiMax®, the IEEE 802.15.4 family of standards), and peer-to-peer (P2P) networks, among others. In an example, the network interface device 408 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the network 420. In an example, the network interface device 408 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine 400, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software. A transmission medium is a machine-readable medium.

The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments in which the invention can be practiced. These embodiments are also referred to herein as “examples”. Such examples can include elements in addition to those shown or described. However, the present inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the present inventors also contemplate examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.

In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, unless stated otherwise, the term “or” is used to refer to a nonexclusive or, such that “A or B” may include “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein”. Also, in the following claims, the terms “including” and “comprising” are open-ended. A system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.

In various examples, the components, controllers, processors, units, engines, or tables described herein can include, among other things, physical circuitry or firmware stored on a physical device. As used herein, “processor” means any type of computational circuit such as, but not limited to, a microprocessor, a microcontroller, a graphics processor, a digital signal processor (DSP), or any other type of processor or processing circuit, including a group of processors or multi-core devices.

Operating a memory cell, as used herein, includes reading from, writing to, or erasing the memory cell. The operation of placing a memory cell in an intended state is referred to herein as “programming,” and can include both writing to or erasing from the memory cell (e.g., the memory cell may be programmed to an erased state).

According to one or more embodiments of the present disclosure, a memory controller (e.g., a processor, controller, firmware, etc.) located internal or external to a memory device is capable of determining (e.g., selecting, setting, adjusting, computing, changing, clearing, communicating, adapting, deriving, defining, utilizing, modifying, applying, etc.) a quantity of wear cycles, or a wear state (e.g., recording wear cycles, counting operations of the memory device as they occur, tracking the operations of the memory device it initiates, evaluating the memory device characteristics corresponding to a wear state, etc.).

According to one or more embodiments of the present disclosure, a memory access device may be configured to selectively reduce the operating rate of one or more components to reduce active power. The memory device control circuitry (e.g., control logic) may be programmed to slow the clock signal provided to the components in response to determining the type of memory accesses (e.g., memory usage patterns) that are being performed by the memory access device.

Method examples described herein can be machine, device, or computer-implemented at least in part. Some examples can include a computer-readable medium, a device-readable medium, or a machine-readable medium encoded with instructions operable to configure an electronic device to perform methods as described in the above examples. An implementation of such methods can include code, such as microcode, assembly language code, a higher-level language code, or the like. Such code can include computer-readable instructions for performing various methods. The code may form portions of computer program products. Further, the code can be tangibly stored on one or more volatile or non-volatile tangible computer-readable media, such as during execution or at other times. Examples of these tangible computer-readable media can include, but are not limited to, hard disks, removable magnetic disks, removable optical disks (e.g., compact discs and digital video disks), magnetic cassettes, memory cards or sticks, random access memories (RAMs), read-only memories (ROMs), solid-state drives (SSDs), Universal Flash Storage (UFS) devices, embedded MMC (eMMC) devices, and the like.

Example 1 includes subject matter (such as a memory system) comprising a memory array including memory cells and a memory controller operatively coupled to the memory array and including a memory controller cache. The memory controller is configured to decode a memory read request from a host device, obtain read data for the memory read request from the memory array according to a memory address, detect that the read data has a correctable error, return corrected read data to the host device in response to the memory read request, store the corrected read data in the memory controller cache and indicate a state of the corrected read data as valid and dirty in response to detecting that the read data has the correctable error, and determine a cache eviction event and write the corrected read data to the memory array in response to the cache eviction event.

In Example 2, the subject matter of Example 1 optionally includes the memory controller being configured to query the memory controller cache in response to the memory read request, allocate a cache line for the read data in response to determining that the requested read data is not stored in the memory controller cache and indicate a state of the cache line as invalid, and change the state of the cache line to valid and dirty in response to detecting that the read data has the correctable error.

In Example 3, the subject matter of one or both of Examples 1 and 2 optionally includes a memory controller configured to write the corrected read data to the memory array during a demand scrub operation to correct errors in data stored in the memory array.

In Example 4, the subject matter of one or any combination of Examples 1-3 optionally includes a media scheduler configured to send a request for the read data to the memory array via a memory array bus, receive the read data and a stored error correction code for the read data from the memory array via the memory array bus, and send the received read data and error correction code to the memory controller cache. The memory controller is configured to determine that the read data includes the correctable error using the error correction code.

In Example 5, the subject matter of Example 4 optionally includes a memory controller configured to send a separate request for the stored error correction code to the memory array via the memory array bus.

In Example 6, the subject matter of one or both of Examples 4 and 5 optionally includes a memory controller configured to store the error correction code for the requested read data at a different address of the memory array than the read data.

In Example 7, the subject matter of one or any combination of Examples 4-6 optionally includes a memory controller cache including error correction circuitry to correct the error in the read data.

In Example 8, the subject matter of one or any combination of Examples 1-7 optionally includes a memory controller included in a first chiplet of the memory system and a memory array included in a second chiplet of the memory system.

In Example 9, the subject matter of Example 8 optionally includes a memory array that is a NAND flash memory array, and the memory system further includes a Double Data Rate (DDR) interface coupled to the memory controller to access the NAND flash memory array.

Example 10 includes subject matter (such as a method of operating a memory system) or can optionally be combined with one or any combination of Examples 1-9 to include such subject matter, comprising receiving a read command at a memory controller device of the memory system from a second device; obtaining data responsive to the read command from a memory array of the memory system according to a memory address and associated with the read command; detecting that the data has a correctable error; transmitting corrected data to the second device in response to the read command; writing the corrected data in a cache of the memory controller device and indicating a state of the corrected data as valid and dirty in response to detecting that the data has the correctable error; and writing the corrected data from the cache to the memory array in response to a cache eviction event.
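
The eviction step at the end of Example 10 is where the corrected data actually reaches the memory array. A sketch of that path, with hypothetical names, is below; because the cache line holds a complete corrected copy, the write back can be a plain line write rather than a read-modify-write:

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_line {
        uint64_t tag;
        bool     valid;
        bool     dirty;
        uint8_t  data[64];
    };

    void mc_array_write(uint64_t addr, const uint8_t *data, unsigned len);

    void mc_evict(struct cache_line *line, uint64_t addr)
    {
        if (line->valid && line->dirty)
            /* Demand scrub: the corrected copy overwrites the faulty copy in the array. */
            mc_array_write(addr, line->data, sizeof line->data);

        line->valid = false;  /* line is now free for reallocation */
        line->dirty = false;
    }

The eviction itself can be triggered by a capacity eviction, a flush operation, or an eviction maintenance request, as recited in the claims that follow.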

In Example 11, the subject matter of Example 10 optionally includes querying the cache of the memory controller device in response to the read command, allocating a cache line for the read data in response to determining that requested read data is not stored in the memory controller cache and indicating a state of the cache line as invalid, and changing the state of the cache line to valid and dirty in response to detecting that the read data has the correctable error.

In Example 12, the subject matter of one or both of Examples 10 and 11 optionally includes writing the corrected read data to the memory array during a demand scrub operation to correct data errors in the memory array.

In Example 13, the subject matter of one or any combination of Examples 10-12 optionally includes sending the read command to a media scheduler of the memory system; sending, by the media scheduler, a request for the data responsive to the read command during an available time slot via a memory array bus; receiving, by the media scheduler, the data and a stored error correction code for the data from the memory array via the memory array bus; sending, by the media scheduler, the received data and error correction code to the memory controller cache; and determining, by the memory controller cache, that the data includes the correctable error using the error correction code.
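
A sketch of the time-slot behavior in Example 13, with hypothetical queue and bus interfaces: the media scheduler holds the request and issues it to the memory array bus only when a slot is available, fetching the read data and its stored error correction code in the process:

    #include <stdbool.h>
    #include <stdint.h>

    struct pending_read {
        uint64_t data_addr;
        uint64_t ecc_addr;
    };

    bool bus_slot_available(void);                 /* true when the array bus is idle */
    bool queue_pop(struct pending_read *out);      /* next pending request, if any */
    void issue_read(uint64_t data_addr, uint64_t ecc_addr);

    void media_scheduler_tick(void)
    {
        struct pending_read r;

        if (bus_slot_available() && queue_pop(&r))
            issue_read(r.data_addr, r.ecc_addr);   /* data and stored ECC over the bus */
    }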

In Example 14, the subject matter of Example 13 optionally includes sending the read command and sending a separate request for the stored error correction code to the memory array via the memory array bus.

In Example 15, the subject matter of one or any combination of Examples 10-14 optionally includes storing error correction code for data stored in the memory array at a different address than the data.
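
One way (an assumption for illustration, not the disclosed layout) to hold error correction code at a different address than the data it protects, as in Example 15, is a reserved region of the array indexed by the data line number:

    #include <stdint.h>

    #define LINE_BYTES      64u
    #define ECC_BYTES        8u
    #define ECC_REGION_BASE  0x3F000000u  /* hypothetical reserved region of the array */

    /* Map a data address to the address of its stored ECC word. */
    static uint64_t ecc_addr_for(uint64_t data_addr)
    {
        uint64_t line_index = data_addr / LINE_BYTES;
        return ECC_REGION_BASE + line_index * ECC_BYTES;
    }

With such a mapping, the separate ECC fetch of Examples 5, 14, and 20 is simply a second bus read directed at ecc_addr_for(data_addr).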

Example 16 includes subject matter or can optionally be combined with one or any combination of Examples 1-15 to include such subject matter, such as a computer-readable storage medium comprising instructions configured to cause a memory controller of a memory system to decode a memory read request from a requesting device, obtain read data for the memory read request from a memory array of the memory system according to a memory address, detect that the read data has a correctable error, return corrected read data to the requesting device in response to the memory read request, store the corrected read data in a memory controller cache of the memory system and indicate a state of the corrected read data as valid and dirty in response to detecting that the read data has the correctable error, and determine a cache eviction event and write the corrected read data to the memory array in response to the cache eviction event.

In Example 17, the subject matter of Example 16 optionally includes instructions to cause the memory controller to query the memory controller cache in response to the memory read request, allocate a cache line for the read data when the requested read data is not stored in the memory controller cache and indicate a state of the cache line as invalid, and change the state of the cache line to valid and dirty in response to detecting that the read data has the correctable error.

In Example 18, the subject matter of one or both of Examples 16 and 17 optionally includes instructions to cause the memory controller to write the corrected read data to the memory array during a demand scrub operation that corrects data errors in the memory array.

In Example 19, the subject matter of one or any combination of Examples 16-18 optionally includes instructions to send the memory read request to a media scheduler of the memory system, send a request for the read data during an available time slot via a memory array bus, receive the read data and a stored error correction code for the read data from the memory array via the memory array bus, send the received read data and error correction code to the memory controller cache, and determine that the read data includes the correctable error using the error correction code.

In Example 20, the subject matter of Example 19 optionally includes instructions to send a separate request via the memory array bus to a separate address of the memory array for the stored error correction code.

These non-limiting Examples can be combined in any permutation or combination.

1. A memory controller for a memory device, the memory controller comprising: a memory controller cache; and processing circuitry configured to: decode a memory read request received from a separate device; obtain read data for the memory read request from a memory array of memory cells according to a memory address; detect that the read data has a correctable error; return corrected read data to the separate device in response to the memory read request; store the corrected read data in the memory controller cache and, in response to detecting that the read data has the correctable error, set metadata for the read data to a state to cause rewriting of data in response to a cache eviction event; and determine a cache eviction event and write corrected read data to the memory array in response to the cache eviction event.
 2. The memory controller of claim 1, wherein the processing circuitry is configured to set a dirty bit included in the metadata for the memory read request in response to detecting the correctable error in the read data.
 3. The memory controller of claim 1, wherein the memory controller cache is configured to initiate the cache eviction event in response to at least one of a capacity eviction, a flush operation, or an eviction maintenance request.
 4. The memory controller of claim 1, wherein the processing circuitry is configured to write the corrected read data to the memory array as part of a demand scrubbing operation.
 5. The memory controller of claim 1, wherein the processing circuitry is configured to write the corrected read data to the memory array using a write operation that is not a read-modify-write operation.
 6. The memory controller of claim 1, wherein the processing circuitry is configured to perform a first memory access to obtain the read data from the memory array and perform a second memory access to obtain an error correction code for the read data from the memory array; and wherein the memory controller cache is configured to determine that the read data includes the correctable error using the error correction code.
 7. The memory controller of claim 1, including: a media scheduler configured to: send a request for the read data to the memory array via a memory array bus; receive the read data and a stored error correction code for the read data from the memory array via the memory array bus; and send the received read data and error correction code to the memory controller cache; and wherein the memory controller cache is configured to determine that the read data includes the correctable error using the error correction code.
 8. The memory controller of claim 1, wherein the processing circuitry is configured to: allocate a cache line for the read data in response to determining that requested read data is not stored in the memory controller cache and indicate a state of the cache line as invalid; and change the state of the cache line to valid and dirty in response to detecting that the read data has the correctable error.
 9. The memory controller of claim 1, wherein the memory controller is included in a chiplet, the processing circuitry includes a packet decoder, and the memory read request is decoded from a packet received via intra-chiplet signaling.
 10. A method of operating a memory controller of a memory system, the method comprising: decoding a memory read request; fetching read data for the memory read request from a memory array of memory cells according to a memory address, wherein the memory array is off-die from the memory controller; detecting that the fetched read data has a correctable error; returning corrected read data for the memory read request; storing the corrected read data in a memory controller cache and setting metadata for the read data in response to detecting that the read data has the correctable error, wherein the metadata is set to a state to cause rewriting of data in response to a cache eviction event; and determining a cache eviction event and writing corrected read data to the memory array in response to the cache eviction event.
 11. The method of claim 10, wherein setting metadata includes setting a dirty bit included in the metadata for the memory read request in response to detecting the correctable error in the read data.
 12. The method of claim 10, wherein determining a cache eviction event includes initiating the cache eviction event by the memory controller in response to at least one of a capacity eviction, a flush operation, or an eviction maintenance request.
 13. The method of claim 10, wherein the writing corrected read data includes writing the corrected read data to the memory array as part of a demand scrubbing operation.
 14. The method of claim 10, wherein the writing corrected read data includes writing the corrected read data to the memory array using a write operation that is not a read-modify-write operation.
 15. The method of claim 10, wherein the fetching read data includes performing a first memory access to obtain the read data from the memory array and performing a second memory access to obtain an error correction code for the read data from the memory array; and wherein determining that the read data includes the correctable error includes detecting the correctable error using the error correction code.
 16. The method of claim 10, including: sending a request for the read data to the memory array via a memory array bus of the memory system; receiving the read data and a stored error correction code for the read data from the memory array via the memory array bus; and sending the received read data and error correction code to a memory controller cache of the memory controller; and wherein determining that the read data has the correctable error includes the memory controller cache detecting the correctable error using the error correction code.
 17. The method of claim 10, including: allocating a cache line for the read data in response to determining that requested read data is not stored in a memory controller cache of the memory controller and indicating a state of the cache line as invalid; and changing the state of the cache line to valid and dirty in response to detecting that the read data has the correctable error.
 18. A memory device including: a memory array of memory cells implemented in multiple integrated circuit memory dies; a first chiplet separate from the memory dies, wherein the first chiplet includes a memory controller and the memory controller includes a memory controller cache; a second chiplet separate from the memory dies, wherein the second chiplet includes a host interface and is configured to receive a read command via the host interface and forward the read command to the memory controller of the first chiplet; and wherein the memory controller is configured to: obtain data responsive to the read command from the memory array according to a memory address and associated with the read command; detect that the data has a correctable error; return corrected data to the second chiplet in response to the read command; write the corrected data in the memory controller cache and set a state of the corrected data in cache as valid and dirty for the read command; and write the corrected data from the memory controller cache to the memory array in response to a cache eviction event.
 19. The memory device of claim 18, wherein the memory controller includes a packet decoder configured to decode the read command from a packet received from the second chiplet via intra-chiplet signaling.
 20. The memory device of claim 18, wherein the memory controller includes a programmable atomic unit (PAU) and the read command is an atomic operation performed using the PAU.